Description

Field of the Invention 

[0001] This invention relates to the field of nucleic acid regulatory elements that affect mRNA translation, export,
and stability. More specifically, the invention relates to the screening of 5' and 3' untranslated RNA sequences, the
identification of RNA regulatory elements within these sequences, and the identification of compounds that modulate
the function of these RNA regulatory sequences.

Background 

[0002] While transcriptional controls regulate gene expression by influencing the rate of mRNA production,
post-transcriptional mechanisms can also regulate gene expression by modulating the amount of protein produced from an
mRNA molecule. For example, gene expression can be regulated by altering mRNA translation efficiency (Izquierdo
and Cuezva, Mol. Cell. Biol. 17: 5255-5268, 1997; Yang et al., J. Biol. Chem. 272: 15466-73, 1997), or by altering mRNA
stability (Ross, Microbiol. Rev. 59: 423-50, 1995). Post-transcriptional control mechanisms appear to play an especially
important role in the gene expression response to environmental factors, such as heat shock (Sierra et
al., Mol. Biol. Rep. 19: 211-20, 1994), iron availability (Hentze et al., Proc. Natl. Acad. Sci. USA 93: 8175-82, 1996),
oxygen availability (Levy et al., J. Biol. Chem. 271: 2746-53, 1996; McGary et al., J. Biol. Chem. 272: 8628-34, 1997),
and growth factors (Amara et al., Nucleic Acids Res. 21: 4803-09, 1993).

[0003] Post-transcriptional regulatory elements may be present in the 5' and 3' mRNA untranslated regions (UTRs).
At the 5' UTR, mRNA binding to ribosomes is generally the rate-limiting step in translation initiation (Mathews et al.,
In: Translational Control, pages 1-30, Eds: Hershey et al., Cold Spring Harbor Laboratory Press, Cold Spring Harbor,
NY, 1996). At the 3' UTR, regulatory elements may modulate mRNA translation and degradation, as well as mRNA
transport and subcellular localization (Jackson, Cell 74: 9-14, 1993). However, the nature of most UTR
post-transcriptional elements remains poorly understood. A method for efficiently characterizing these mRNA regulatory sequences
would advance the discovery of compounds that modulate expression of therapeutically important proteins via
regulatory mRNA sites.

Summary of the Invention 

[0004] We have discovered a method for constructing libraries that are specifically biased for RNA regulatory sites.
In the first aspect, the invention features a cDNA library consisting essentially of at least 100 different cDNA sequences
that correspond to different mRNA untranslated region (UTR) sequences isolated and separate from adjacent mRNA
coding sequences. Preferably, the cDNA sequences are cloned into a vector system that can express the sequences,
and such a vector is also a feature of this invention. This vector includes the following: a) a nucleotide sequence
encoding an mRNA UTR sequence in operative linkage to a promoter, wherein the nucleotide sequence is derived
from the cDNA library of the first aspect; b) a first reporter gene positioned for transcription upstream or downstream
of the UTR-encoding nucleotide sequence; and c) a second, different reporter gene in operative linkage to a promoter
but unassociated with the UTR-encoding nucleotide sequence. Preferably, the reporter genes encode a fluorescent
protein or cell surface marker protein.

[0005] A second and related aspect of the invention features a cDNA library, wherein the library is constructed by
steps that include the following: a) purifying poly(A)+ RNA from total RNA; b) performing controlled, non-random
enzymatic digestion of AUG sequences in the poly(A)+ RNA; c) purifying the digested RNA to obtain the fragments
containing the 5' end sequences; and d) synthesizing cDNA from the purified RNA obtained in step (c); wherein the
library consists essentially of cDNA sequences corresponding to mRNA 5' untranslated region (UTR) sequences,
isolated and separate from adjacent mRNA coding sequences. Preferably, the enzymatic digestion is carried out using
RNase H.

[0006] In a third aspect, the invention features a cDNA library constructed by steps that include the following: a)
purifying poly(A)+ RNA from total RNA; b) synthesizing nucleic acid heteroduplexes from the poly(A)+ RNA, using
degenerate primers that hybridize preferentially to the region surrounding and including the initiation codon, where the
heteroduplexes comprise the 5' end sequences of the RNA; c) purifying the heteroduplexes obtained in step (b) to
obtain the fragments containing the 5' end sequences; and d) synthesizing cDNA from the purified heteroduplexes
obtained in step (c); wherein the library consists essentially of cDNA sequences corresponding to mRNA 5' untranslated
region (UTR) sequences, isolated and separate from adjacent mRNA coding sequences.

[0007] In one embodiment of any of the above three aspects of the invention, the cDNA library consists essentially
of cDNA sequences corresponding to mRNA untranslated region sequences, isolated in intact form.

[0008] In preferred embodiments of the second or third aspects of the invention, the 5' sequence purification is carried 



EP1 176 196 A1 



out using a cap binding protein, for example, an eIF4E fusion protein or an antibody to the 5' cap, and the cDNA
sequences are cloned into a vector system that can express the sequences. This vector includes the following: a) a
nucleotide sequence encoding an mRNA UTR sequence in operative linkage to a promoter, wherein the nucleotide
sequence is derived from the cDNA library of the second or third aspect; b) a first reporter gene positioned for
transcription upstream or downstream of the UTR-encoding nucleotide sequence; and c) a second, different reporter gene
in operative linkage to a promoter but unassociated with the UTR-encoding nucleotide sequence. Preferably, the
reporter genes encode a fluorescent protein or cell surface marker protein.

[0009] A related fourth aspect of the invention is a cDNA library, wherein the library is constructed by steps that
include the following: a) purifying poly(A)+ RNA from total RNA; b) performing random digestion on the poly(A)+ RNA;
c) purifying the digested RNA to obtain poly(A)-containing fragments; and d) synthesizing cDNA from the purified RNA
obtained in step (c); wherein the library consists essentially of cDNA sequences corresponding to 3' UTR sequences,
isolated and separate from adjacent mRNA coding sequences.

[0010] A cDNA library is also featured in the fifth aspect of the invention. This cDNA library is constructed by steps
that include the following: a) purifying poly(A)+ RNA from total RNA; b) loading the poly(A)+ RNA with ribosomes; and
c) performing reverse transcription on the loaded poly(A)+ RNA using an oligo(dT) primer and polymerase;
wherein the library consists essentially of cDNA sequences corresponding to 3' UTR sequences, isolated and separate
from adjacent mRNA coding sequences. Preferably, the cDNA sequences of the libraries of the fourth or fifth aspects
are cloned into vector systems that can express the sequences, and such vectors are also a feature of this invention.
These vectors include the following: a) a nucleotide sequence encoding an mRNA UTR sequence in operative linkage
to a promoter, wherein the nucleotide sequence is derived from the cDNA library of the fourth or fifth aspect; b) a first
reporter gene positioned for transcription upstream or downstream of the UTR-encoding nucleotide sequence; and c)
a second, different reporter gene in operative linkage to a promoter but unassociated with the UTR-encoding nucleotide
sequence. Preferably, the reporter genes encode a fluorescent protein or cell surface marker protein.
[0011] In one embodiment of the fourth or fifth aspect of the invention, the cDNA library consists essentially of cDNA
sequences corresponding to 3' untranslated region sequences, isolated in intact form.

[0012] A sixth aspect of the invention provides a method of identifying a regulatory UTR sequence that includes the
following steps: a) transfecting a plurality of host cells with a plurality of vectors of the present invention, wherein the
host cells are transfected with different UTR sequences; b) sorting cells on the basis of the ratio between expression
of the first reporter gene and the second reporter gene; c) identifying the cells of step (a) that have skewed expression
ratios as compared to the population of cells of step (a) as a whole, or as compared to cells transfected with a vector
that encodes the first and second reporter gene, but lacks the corresponding UTR sequence; and d) sequencing the
UTR expressed in the identified cells. Preferably, the gene expression is detected by emission of fluorescence and the
cells are sorted by a fluorescence activated cell sorter.

[0013] The seventh and final aspect of the invention features a cell transfected with any of the vectors of the present
invention.

[0014] By "different mRNA untranslated region (UTR) sequences" or "different UTR sequences" is meant sequences
that differ from each other in that they are derived from different mRNA species. As used herein, mRNA UTR sequences
that are products of alternative splicing are considered to be different mRNA UTR sequences.
[0015] By "controlled, non-random enzymatic digestion of AUG sequences" is meant preferentially digesting mRNA
at the site of AUG sequences, for example, using RNase H and a mixture of degenerate AUG-complementary
oligonucleotide 7-mers, under conditions that require hybridization of more than 5 consecutive base pairs for RNase
substrate recognition. To preferentially digest the initiation-AUG sequences in an mRNA population, the 7-mers in the
AUG-complementary oligonucleotide mixture used have frequencies of A, C, G, and T at each position that are
complementary to the frequencies of A, C, G, and U occurring in all known vertebrate mRNA sequences between the -3
and +4 position (where +1 is the first nucleotide of the coding sequence) (see, e.g., Table 1).
[0016] By "UTR sequences isolated and separate from adjacent mRNA coding sequences" is meant the following:
1) 5' UTR sequences that begin at the 5' end of a transcribed mRNA and extend up to, but do not include, the translation
AUG initiation site; and 2) 3' UTR sequences that begin at the mRNA nucleotide in the position 3' adjacent to the
translation termination site and extend to the poly(A) tail of the transcribed mRNA. Preferably, the UTR sequences are
isolated in intact form.
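
By way of illustration only, the boundaries given in this definition can be sketched in Python. The function name split_utrs, the assumption that the position of the initiation AUG is already known, and the example sequence are all hypothetical and are not part of the described method; the poly(A) tail is assumed to have been removed from the input.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def split_utrs(mrna: str, start: int) -> tuple[str, str, str]:
    """Split an mRNA (written 5'->3', poly(A) tail removed) into
    (5' UTR, coding sequence, 3' UTR).

    `start` is the 0-based index of the A of the initiation AUG and is
    assumed to be known; this sketch does not predict initiation sites.
    """
    if mrna[start:start + 3] != "AUG":
        raise ValueError("start does not point at an AUG codon")
    # Walk codon by codon until the first in-frame termination codon.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3  # first nucleotide 3' of the termination codon
            # 5' UTR: up to, but not including, the initiation AUG.
            # 3' UTR: begins immediately 3' of the termination site.
            return mrna[:start], mrna[start:end], mrna[end:]
    raise ValueError("no in-frame termination codon found")

# Hypothetical toy transcript: 6-nt 5' UTR, 3-codon ORF, 6-nt 3' UTR.
five_utr, cds, three_utr = split_utrs("GCCACCAUGGCUUAAGGCUUU", 6)
```

In this sketch the termination codon is kept with the coding sequence, so that neither UTR contains any coding nucleotides.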

[0017] By "random digestion" of poly(A)+ RNA is meant RNase digestion using, for example, RNase H and random
primers to digest the RNA into smaller fragments at random sites.

[0018] By "loading poly(A)+ RNA with ribosomes" is meant contacting the RNA population with ribosomes, for
example, in a rabbit reticulocyte lysate, to allow for loading of the ribosomes onto the RNA. To maximize ribosome loading,
a chemical that prevents ribosome runoff, for example, cycloheximide, can be included.
[0019] By a "plurality" is meant more than one. 

[0020] By "skewed expression ratios" is meant a change in the ratio of expression of a first reporter gene that is
associated with a specific UTR to expression of a non-UTR associated second reporter gene, as compared to the ratio
of expression of the first reporter gene that is not associated with the same UTR to expression of the non-UTR
associated second reporter gene.

[0021] The screening assay and the 5' and 3' mRNA untranslated region (UTR) biased cDNA libraries of the present
invention have a number of advantages. The biased UTR libraries provide a collection of UTR sequences that are
isolated and separated from any adjacent coding sequences. Thus, screening these libraries provides the opportunity to
screen essentially complete UTR sequences without interference from coding sequences. In addition, the quantity of
sequences screened and the specificity of output can be modulated by controlling conditions that regulate the number
of different plasmids that enter each cell. In most circumstances, the ideal number of plasmids per cell would be limited
to one, thereby reducing signal dilution and the occurrence of false negative results.

[0022] Other features and advantages of the invention will be apparent from the detailed description thereof and 
from the claims. 

Description of the Figures 

[0023] Fig. 1 demonstrates RNase H digestion of a control RNA sequence using specific or partially degenerate 
oligodeoxynucleotide 7-mers, under conditions that allow hydrolysis only if 6 or more consecutive base pairs are hy- 
bridized (compare lanes 4 and 5). 

[0024] Fig. 2 demonstrates RNase H digestion of a control sequence using two different sequence-specific
oligodeoxynucleotides, under conditions that allow hydrolysis only if 7 consecutive base pairs are hybridized (see lane 3).
[0025] Fig. 3 shows RNase H digestion of poly(A)+ RNA using a partially degenerate oligonucleotide 7-mer, under
conditions that allow hydrolysis only if 7 consecutive base pairs are hybridized. The number of hydrolysis sites can be
limited, even after extended incubation (compare lanes 6 and 7).
[0026] Fig. 4 illustrates limited reverse transcription of 3' UTR sequences.

Detailed Description 

[0027] The practice of the present invention employs conventional techniques in biochemistry, molecular biology,
microbiology, and related fields that are known to those skilled in the art. These techniques are fully explained in the
literature (see, e.g., Maniatis et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press
(1982); Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory
Press (1989); and Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons (1987-1996 ed.)).

Construction of UTR Libraries 

[0028] Poly(A)+ RNA is isolated from total cellular RNA, according to standard protocol (Aviv and Leder, Proc. Natl.
Acad. Sci. USA 69: 1408-12, 1972).

[0029] To construct 5' UTR biased libraries, poly(A)+ RNA is subjected to controlled, non-random enzymatic digestion
followed by size selection. The enzymatic digestion of the poly(A)+ RNA is carried out, for example, using E. coli RNase
H in the presence of a 7-mer oligodeoxynucleotide mixture, wherein the sequences of the oligodeoxynucleotides have
A, C, G, and T at frequencies of occurrence that are complementary to the frequencies of occurrence of A, C, G, and
U in all known vertebrate mRNA sequences between the -3 and +4 positions of the mRNA (where position 1 of the
oligodeoxynucleotide is complementary to position +4 on the mRNA and position 7 is complementary to position -3 on
the mRNA; see Table 1).



Table 1.

Designing Degenerate Oligonucleotides for the Isolation of 5' UTRs

mRNA position                                +4    +3    +2    +1    -1    -2    -3
7-mer oligodeoxynucleotide position           1     2     3     4     5     6     7
7-mer oligodeoxynucleotide            A      15     0   100     0     9    11     1
frequency (%)                         C      46   100     0     0    21    13    36
                                      G      16     0     0     0    55    49     2
                                      T      23     0     0   100    16    27    61


[0030] E. coli RNase H requires hybridization of only four consecutive base pairs in order to recognize a DNA/RNA
duplex region as a substrate (Donis-Keller, Nucleic Acids Res. 7: 179-192, 1979). The controlled RNase H digestion
using the above-described oligodeoxynucleotides will therefore primarily hydrolyze the initiation codon, but because of
the degeneracy of the oligodeoxynucleotide mixture, and the small number of consecutive base pairs required under
physiological conditions, RNase H can also hydrolyze the RNA at many other locations, including regions in the 5'
UTR. To further restrict the digestion to the initiation codon, conditions can be modified such that RNase H recognition
requires hybridization of more than five base pairs (see Example 1). The AUG sequence is rare within the 5' UTR
sequences (Kozak, Nucleic Acids Res. 15: 8125-48, 1987). Therefore, this RNase H digestion will preferentially result
in intact, full-length 5' UTR sequences that are separated from the adjoining coding sequences.
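
The effect of the minimum duplex length on site selection can be sketched as follows. This is a simplified, hypothetical model in which hydrolysis is permitted wherever an oligodeoxynucleotide forms at least min_bp consecutive Watson-Crick base pairs with the RNA; it ignores hybridization thermodynamics, and the function names and the test sequence are assumptions made for illustration.

```python
DNA_TO_RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}

def rna_target(oligo):
    """RNA sequence (5'->3') that pairs perfectly with a DNA oligo (5'->3')."""
    return "".join(DNA_TO_RNA_COMPLEMENT[b] for b in reversed(oligo))

def cleavable_sites(rna, oligo, min_bp):
    """Return sorted (rna_start, run_length) pairs for every maximal run of
    at least min_bp consecutive base pairs between the oligo and the RNA."""
    target = rna_target(oligo)
    sites = set()
    for i in range(len(rna)):
        for j in range(len(target)):
            # Skip positions that merely continue a run started earlier.
            if i > 0 and j > 0 and rna[i - 1] == target[j - 1]:
                continue
            n = 0
            while (i + n < len(rna) and j + n < len(target)
                   and rna[i + n] == target[j + n]):
                n += 1
            if n >= min_bp:
                sites.add((i, n))
    return sorted(sites)

# Hypothetical RNA: a full 7-bp site at index 4 (around an AUG) and a
# partial 4-bp match (CCAU) at index 16 that does not include an AUG.
rna = "GGGCACCAUGGCUUUUCCAUCAAA"
permissive = cleavable_sites(rna, "CCATGGT", min_bp=4)
stringent = cleavable_sites(rna, "CCATGGT", min_bp=6)
```

With a four-base-pair requirement both sites qualify, whereas requiring more than five base pairs leaves only the full-length site surrounding the AUG.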
[0031] To enrich the population of 5' UTR-containing fragments within the mRNA sample, fragments of up to 1000
nucleotides are selected using denaturing agarose gels. The 5' UTRs of most vertebrate mRNAs fall within the size
range of 20-100 nucleotides (Kozak, supra). Subsequent to size selection, the mRNA sample is subjected to affinity
purification using a recombinant eIF4E fusion protein that interacts with the mRNA 5' cap structure (Sonenberg and
Gingras, Curr. Opin. Cell Biol. 10: 268-75, 1998).

[0032] An alternative strategy for isolating 5' UTRs from purified poly(A)+ RNA is to reverse transcribe the poly(A)+
RNA using a degenerate (i.e., mixed-sequence) primer that hybridizes preferentially to the region surrounding and
including the initiation codon (the 3' border of the 5' UTR).

[0033] The consensus sequence surrounding the initiation codon of vertebrate mRNAs is GCC (G/A)CC AUG G
(SEQ ID NO: 1), where the underlined sequence (AUG) is the initiation codon, and the nucleotides in parentheses are found
with nearly equal frequency at that position.

[0034] A degenerate primer complementary to this consensus sequence can be designed that takes into account all
the variations in frequency of the nucleotides at each position, so that the primer mixture has a high probability of
hybridizing specifically to the initiation codon region. Table 2, below, shows how such primers can be designed, based on
the known sequences of hundreds of vertebrate mRNAs.



Table 2.

Designing Degenerate Primers for the Isolation of 5' UTRs

mRNA     3'   +4    +3    +2    +1    -1    -2    -3    -4    -5    -6   5'
Primer   5'    1     2     3     4     5     6     7     8     9    10   3'
%A            10     0   100     0    10    10     5    10    20    20
%C            50   100     0     0    20    15    30    15    20    45
%G            20     0     0     0    55    50     5    50    40    20
%T            20     0     0   100    15    25    60    25    20    15

Data based on Kozak, Nucleic Acids Res. 15: 8125-8148, 1987 and Kozak, Gene 234: 187-208, 1999.


[0035] Referring to Table 2, the mixed-sequence primer reading 5' to 3' is complementary to the mRNA sequence
surrounding the initiation codon. The numbering across the top from +4 through -6 corresponds to the numbering for
the mRNA sequence, where position +1 is the first nucleotide of the initiation codon, and all the negative numbers refer
to nucleotides in the 5' UTR. The percentages refer to the frequency of occurrence of a given nucleotide at a given
position. Therefore, the primer would be synthesized such that, for example, at position 5, A occurs 10% of the time,
C occurs 20% of the time, G occurs 55% of the time, and T occurs 15% of the time. Note that positions 2, 3, and 4 are
invariant as they are complementary to the initiation codon, AUG. It is expected that a degenerate primer of the above
composition would hybridize preferentially to the region of the mRNA surrounding and including the initiation codon.
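
As a consistency check on Table 2, offered only as a sketch, the most abundant primer in the mixture can be derived from the tabulated frequencies and compared with the consensus sequence of paragraph [0033]. The function names are hypothetical, and positions are assumed to be filled independently during synthesis.

```python
BASES = "ACGT"
DNA_TO_RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}

# Table 2 frequencies (%), indexed by primer position 1..10 (5'->3').
TABLE2 = {
    "A": [10, 0, 100, 0, 10, 10, 5, 10, 20, 20],
    "C": [50, 100, 0, 0, 20, 15, 30, 15, 20, 45],
    "G": [20, 0, 0, 0, 55, 50, 5, 50, 40, 20],
    "T": [20, 0, 0, 100, 15, 25, 60, 25, 20, 15],
}

def most_probable_primer(table):
    """Pick the most frequent base at each primer position."""
    length = len(table["A"])
    return "".join(max(BASES, key=lambda b: table[b][pos]) for pos in range(length))

def preferred_target(primer):
    """mRNA sequence (5'->3') that is perfectly complementary to the primer."""
    return "".join(DNA_TO_RNA_COMPLEMENT[b] for b in reversed(primer))

def mixture_fraction(table, primer):
    """Expected fraction of the mixture made up of one specific primer."""
    frac = 1.0
    for pos, base in enumerate(primer):
        frac *= table[base][pos] / 100
    return frac

primer = most_probable_primer(TABLE2)   # "CCATGGTGGC"
target = preferred_target(primer)       # "GCCACCAUGG"
share = mixture_fraction(TABLE2, primer)
```

The preferred target, GCCACCAUGG, is the consensus GCC (G/A)CC AUG G with A at position -3. Under the independence assumption, this single most abundant primer accounts for only about 0.74% of the mixture, which illustrates how degenerate the mixture is.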
[0036] Following RT-PCR to generate a minus strand cDNA hybridized to mRNA, the heteroduplex can be isolated
by affinity purification of the complex. The mRNA/cDNA hybrids are incubated with either a monoclonal antibody to
the 5' cap or a cap-binding protein, for example, an eIF4E protein attached to a solid matrix, washed and eluted to
enrich for RNAs containing the full 5' UTR. Following elution of the complex, the RNA is digested with RNase H and
terminal transferase is used to label the 3' end of the cDNA with poly d(T). Poly d(A) is then used to prime the
second strand synthesis of the cDNA. The 5' UTR enriched library is then cloned 5' to the reporter gene.
[0037] To construct the 3' UTR biased libraries, poly(A)+ RNA is digested, for example, using random primers and
E. coli RNase H, followed by selection of poly(A)-containing fragments using oligo(dT)-linked resin. The isolated
poly(A)-containing fragments are incubated with reverse transcriptase using oligo(dT) primers. Alternatively, to retrieve
mRNA that is exclusively 3' UTR, isolated RNA is allowed to associate with ribosomes, for example, in lysates from
rabbit reticulocytes. Under conditions in which ribosome run-off is inhibited by cycloheximide, reverse transcription is
performed in the presence of oligo(dT) and a low efficiency polymerase (see Example 2).

[0038] The purified 5' and 3' UTR RNA fragments are subjected to 5' RACE (Rapid Amplification of cDNA Ends) to
obtain double-stranded cDNA (Frohman, In: PCR Protocols: A Guide to Methods and Applications, pages 28-38, Eds:
Innis et al., Academic Press, London). The 5' or 3' UTR cDNAs are then ligated into an expression vector of choice,
for example, a retroviral vector. The 5' and 3' UTR sequences are positioned upstream or downstream, respectively,
of a reporter gene's coding sequence.

Screening Assay 

[0039] The expression vectors used for transfection of host cells each encode one UTR, in operative linkage to a
promoter, linked to its UTR-associated first reporter gene. The vector also includes a second, different reporter gene
that is operably linked to a promoter, but is not associated with the UTR. Expression of this UTR-independent second
reporter gene is not regulated by any UTR effect. Thus, expression of this second reporter gene controls for differences
in expression that result from variations in plasmid number or transcriptional efficiency. In addition, conditions can be
varied to reduce the number of different vectors, and, thus, the number of UTRs, that are introduced into each cell. To
carry out host cell transfection, conditions are adopted to limit transfection, preferably, to less than 5 plasmids per cell,
most preferably, to one plasmid per cell. Usually, it is preferable to identify conditions that allow nearly clonal delivery
of the vectors to the cells. For retroviral transduction methods, cells are infected at a multiplicity of infection (MOI) such
that each cell is infected with approximately one virus. The MOI can be determined empirically for each cell line and
construct. Alternatively, plasmids can be delivered to cells via protoplast fusion (Tan and Frankel, Proc. Natl. Acad.
Sci. 95: 4247-52, 1998). For this method, E. coli are transformed with plasmid libraries, the bacterial cell walls are
removed and the resulting protoplasts are fused to mammalian cells with polyethylene glycol. By adjusting the ratio of
protoplasts to mammalian cells, plasmid delivery is reported to be nearly clonal, with individual cells containing 1000
copies of a single plasmid.
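
The relationship between MOI and the number of vectors per cell can be explored with a simple model, shown here as a sketch only. It assumes that the number of infecting particles per cell follows a Poisson distribution with mean equal to the MOI; this is a common textbook approximation, not a statement from the present description, and as noted above the working MOI is determined empirically. The function names are hypothetical.

```python
import math

def poisson_pmf(k, moi):
    """Probability that a cell receives exactly k vectors (Poisson model)."""
    return math.exp(-moi) * moi ** k / math.factorial(k)

def single_hit_fraction(moi):
    """Among cells that received at least one vector, the fraction that
    received exactly one."""
    uninfected = poisson_pmf(0, moi)
    return poisson_pmf(1, moi) / (1.0 - uninfected)

# Compare a few hypothetical MOI values.
for moi in (0.1, 0.5, 1.0):
    print(f"MOI {moi}: uninfected {poisson_pmf(0, moi):.3f}, "
          f"single-hit among infected {single_hit_fraction(moi):.3f}")
```

Under this model, lowering the MOI raises the proportion of infected cells that carry a single vector, at the cost of leaving more cells uninfected, which is the trade-off behind seeking nearly clonal delivery.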

[0040] The choice of cell type to be used will depend on several factors, for example, the biological system of interest
and the ease of foreign DNA transfection. Thus, if the biological system of interest is breast cancer-related genes, a
breast cancer cell line may be used. In addition, given that retroviral transduction may be the only efficient means of
transfection in some cell lines, use of these cells will not be preferred if another means of transfection is desired.
[0041] Expression of the UTR-associated reporter gene will be compared to expression of the non-UTR associated
second reporter gene. Any discrepancies in this ratio of expression could reflect UTR-mediated changes in mRNA
translation, export, or stability. Many potential schemes for detecting expression and identifying expression-altering
UTRs are available. Particularly well-suited systems are those that produce a colored or otherwise detectable product
as determined by gel electrophoresis, detection of fluorescence, chemiluminescence, or antibody binding. For example,
cells that express such UTRs can be identified and isolated using a fluorescence activated cell sorter (FACS) and
green fluorescent protein (GFP) as a reporter gene (Bierhuizen et al., Biochem. Biophys. Res. Commun. 234: 371-375,
1997; Grignani et al., Cancer Res. 58: 14-19, 1998; de Martin et al., Gene Ther. 4: 493-495, 1997; Foster et al., J.
Virol. Methods 75: 151-60, 1998). Such a system is advantageous for high throughput screening. Other systems that
can be used to track gene expression include detecting E. coli lacZ-encoded β-galactosidase activity coupled with a
fluorogenic substrate (Fiering et al., Cytometry 12: 291-301, 1991) and detecting the expression of foreign cell-surface
antigens by means of fluorescently-labeled antibodies (Planelles et al., Gene Ther. 2: 369-76, 1995).
[0042] In the case of detection by fluorescence, the emission spectra of the fluorophores used to track expression
of the UTR-associated first reporter genes and non-UTR associated second reporter genes must be sufficiently different
so that, for example, the FACS instrument can perform two-color analysis and sort cells on the basis of the correlation
between expression of the two reporter genes. The transfected cell population will show four different expression
patterns as follows: 1) cells that are negative for both gene markers, indicating transfection failure; 2) cells with a
ratiometric relationship between expression of the UTR-linked gene and the control gene, indicating that the UTR has
no effect on gene expression; 3) cells with disproportionately higher expression of the UTR-linked gene, indicating that
the UTR enhances translation efficiency or mRNA stability; and 4) cells with disproportionately lower UTR-linked gene
expression, indicating that the UTR reduces translation efficiency or mRNA stability.

[0043] Following FACS sorting, cells with skewed fluorescence signals can be collected for further analysis. The
sequence of the expression-altering UTR can be determined using, for example, PCR with vector primers, or plasmid
rescue. One fluorescent color readout is dependent upon levels of expression of the non UTR-linked second reporter
gene and the other color is dependent upon the levels of expression of the UTR-linked gene. The FACS instrument is
capable of determining the levels of expression of both colors simultaneously and plots the two levels for each individual
cell versus each other. It is expected that most UTRs will not affect gene expression and, therefore, a majority of the
transfected cells should express a consistently proportional level of both gene products. This population of cells will
occupy a characteristic region of the two-color plot. Cells that fall outside of this region will be automatically sorted into
one of two tubes, with UTR-linked genes that proportionally up-regulate gene expression in one tube and UTR-linked
genes that down-regulate gene expression in the other.
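
The gating logic described above can be sketched in Python, purely for illustration. The background cutoff, the two-fold threshold, the use of the population median as the "characteristic region", and the function name classify_cells are all assumptions; an actual sort would use gates drawn on the instrument.

```python
import math
import statistics

def classify_cells(cells, background=10.0, fold=2.0):
    """Label each cell from its (utr_signal, control_signal) pair.

    Labels: 'untransfected' (both signals at or below background),
    'proportional', 'up', or 'down' relative to the population median
    of log2(utr / control).
    """
    labels = [None] * len(cells)
    log_ratios = {}
    for idx, (utr, ctrl) in enumerate(cells):
        if utr <= background and ctrl <= background:
            labels[idx] = "untransfected"
        else:
            # Floor each signal at background to avoid dividing by ~0.
            log_ratios[idx] = math.log2(max(utr, background) / max(ctrl, background))
    if not log_ratios:
        return labels
    baseline = statistics.median(log_ratios.values())
    threshold = math.log2(fold)
    for idx, value in log_ratios.items():
        shift = value - baseline
        if shift >= threshold:
            labels[idx] = "up"
        elif shift <= -threshold:
            labels[idx] = "down"
        else:
            labels[idx] = "proportional"
    return labels

# Hypothetical fluorescence readings: (UTR-linked reporter, control reporter).
cells = [(1, 2), (100, 100), (200, 210), (50, 48), (400, 100), (20, 100), (300, 310)]
labels = classify_cells(cells)
```

Using the median of the population as the baseline reflects the expectation that most UTRs do not affect expression, so the bulk of transfected cells defines the proportional region and only the outliers are collected.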

[0044] A similar strategy can be used to screen and identify compounds that affect the function of the 5' and 3' UTR
regulatory elements. Compounds that modulate the UTR effect on gene expression would skew the expression of the
UTR-linked gene as compared to gene expression in the absence of the compound.

Example 1: Selective RNase H Digestion of mRNA

[0045] Conditions for digestion can be adopted that prevent RNase H hydrolysis unless mRNA hybridization to the
oligodeoxynucleotide probe encompasses more than 5 or 6 consecutive nucleotides. This was demonstrated in an
experiment in which a 7-mer oligodeoxynucleotide was designed to hybridize to a control mRNA species at multiple
locations, but to form no more than five consecutive DNA/RNA base pairs at any one of these locations. No hydrolysis
occurred using this oligodeoxynucleotide, but it did occur using a partially degenerate oligodeoxynucleotide, NNCATNN
(where N is an equimolar mixture of A, C, G, and T), which allowed hybridization of 6 or 7 consecutive base pairs (see
Fig. 1). Following denaturation of 0.2 µg control RNA (Promega luciferase control sequence) and 70 pmol
oligodeoxynucleotide in 10 mM Tris-HCl, pH 8.0, 50 mM NaCl, at 70°C for 10 minutes, samples were submerged in ice. RNase
H, MgCl2, and DTT were added to final concentrations of 0.4 units, 5 mM, and 1 mM, respectively. Samples were
incubated at 20°C for 60 minutes. The reactions were terminated by the addition of EDTA to a final concentration of
25 mM, and digestion products were separated and visualized on a 1% TBE non-denaturing agarose gel stained with
ethidium bromide.

[0046] Conditions for RNase digestion can also be controlled such that a sequence-specific oligodeoxynucleotide
7-mer will mediate RNase H-catalyzed hydrolysis of RNA only at the single site where seven consecutive DNA/RNA
base pairs can form (see Fig. 2). These conditions included denaturing 0.2 µg control RNA (Promega luciferase control
sequence) and 250 nmol oligodeoxynucleotide in 10 mM Tris-HCl, pH 8.0, 50 mM NaCl, at 70°C for 10 minutes before
submerging the samples in ice. Following the addition of RNase H, MgCl2, and DTT, as described above, and incubation
at 20°C for 60 minutes, the digestion was terminated with the addition of EDTA to a final concentration of 25 mM.
Digestion products were separated and visualized on a 6% polyacrylamide gel stained with ethidium bromide.
[0047] A population of poly(A)+ RNA can be substituted for a control mRNA, and the poly(A)+ RNA can be partially
hydrolyzed with a degenerate oligodeoxynucleotide, as shown in Fig. 3. Thus, under conditions that prevent hydrolysis
by RNase H where fewer than seven consecutive DNA/RNA base pairs are formed, a partially degenerate
oligodeoxynucleotide can be used in the reaction with poly(A)+ RNA, and the number of hydrolysis sites can still be limited, even
after an extended incubation period.

Example 2: Use of Ribosomes to Construct Full Length 3' UTRs 

[0048] Using reverse transcription and an oligo(dT) primer, a full length 3' UTR sequence can be copied to cDNA.
Reverse transcription begins with the poly(A) region and proceeds upstream towards the 5' end of the 3' UTR. To
terminate transcription at the coding sequence termination site, the mRNA is fully loaded with actively translating
ribosomes, which cause steric hindrance of the transcriptase. Given that ribosomes do not bind mRNA downstream of
the termination codon, the reverse transcriptase proceeds unhindered to copy the entire 3' UTR sequence, but the
activity of the reverse transcriptase is then terminated, effectively separating the full length 3' UTR from any upstream
coding sequence (see Fig. 4).

Other Embodiments 

[0049] All publications mentioned herein are hereby incorporated by reference. 



Claims 

1. A cDNA library consisting essentially of at least 100 different cDNA sequences that correspond to different mRNA
untranslated region (UTR) sequences isolated and separate from adjacent mRNA coding sequences.

2. A cDNA library, wherein said library is constructed by steps comprising 

a) purifying poly(A)+ RNA from total RNA; 

b) performing controlled, non-random enzymatic digestion of AUG sequences in the poly(A)+ RNA;

c) purifying said digested RNA to obtain the fragments containing the 5' end sequences; and 

d) synthesizing cDNA from the purified RNA obtained in step (c); 
wherein said library consists essentially of cDNA sequences corresponding to mRNA 5' untranslated region
(UTR) sequences, isolated and separate from adjacent mRNA coding sequences.

3. The cDNA library of claim 2, wherein said enzymatic digestion is carried out using RNase H.

4. A cDNA library, wherein said library is constructed by steps comprising 

a) purifying poly(A)+ RNA from total RNA;

b) synthesizing nucleic acid heteroduplexes from said poly(A)+ RNA using degenerate primers that hybridize
to the region surrounding and including the initiation codon, said heteroduplexes comprising the 5' end
sequences of said RNA;

c) purifying the heteroduplexes obtained in step (b) to obtain the fragments containing the 5' end sequences;
and

d) synthesizing cDNA from the purified heteroduplexes obtained in step (c); 

wherein said library consists essentially of cDNA sequences corresponding to mRNA 5' untranslated region (UTR)
sequences, isolated and separate from adjacent mRNA coding sequences.

5. The cDNA library of claim 2 or 4, wherein said 5' sequence purification is carried out using a cap binding protein.

6. A cDNA library, wherein said library is constructed by the steps comprising 

a) purifying poly(A)+ RNA from total RNA; 

b) performing random digestion on the poly(A)+ RNA;

c) purifying said digested RNA to obtain poly(A)-containing fragments; and

d) synthesizing cDNA from the purified RNA obtained in step (c); 

wherein said library consists essentially of cDNA sequences corresponding to 3' UTR sequences, isolated
and separate from adjacent mRNA coding sequences.

7. A cDNA library, wherein said library is constructed by steps comprising

a) purifying poly(A)+ RNA from total RNA;

b) loading said poly(A)+ RNA with ribosomes; and

c) performing reverse transcription on said loaded poly(A)+ RNA using an oligo(dT) primer and polymerase;

wherein said library consists essentially of cDNA sequences corresponding to 3' UTR sequences, isolated
and separate from adjacent mRNA coding sequences.

8. The cDNA library of claim 1, 2, 4 wherein said cDNA sequences are cloned into a vector system that can express
said sequences.

9. The cDNA library of claim 1, 2, 4, 6 or 7 wherein said UTR sequences are isolated in intact form.

10. A vector comprising 

a) a nucleotide sequence encoding an mRNA UTR sequence in operative linkage to a promoter, wherein said 
nucleotide sequence is derived from the cDNA library of claim 1, 2, 4, 6 or 7; 

b) a first reporter gene positioned for transcription upstream or downstream of said UTR-encoding nucleotide
sequence; and

c) a second, different reporter gene in operative linkage to a promoter but unassociated with said UTR-encoding
nucleotide sequence.

11. The vector of claim 10 wherein said reporter genes encode a fluorescent protein or cell surface marker protein.

12. A method of identifying a regulatory UTR sequence, said method comprising 

a) transfecting a plurality of host cells with a plurality of vectors of claim 11, wherein said host cells are
transfected with different UTR sequences;

b) sorting cells on the basis of the ratio between expression of the first reporter gene and the second reporter
gene;

c) identifying the cells of step (a) that have skewed expression ratios as compared to the population of cells
of step (a) as a whole, or as compared to cells transfected with a vector that encodes the first and second
reporter gene, but lacks the corresponding UTR sequence; and

d) sequencing the UTR expressed in said identified cells.

13. The method of claim 12 wherein said gene expression is detected by emission of fluorescence. 

14. The method of claim 13 wherein said cells are sorted by a fluorescence activated cell sorter.

15. A cell transfected with the vector of claim 10. 
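
As an illustration only of the ratio-based selection recited in claim 12, the following Python sketch flags cells whose first-to-second reporter ratio is skewed relative to the population. It is not part of the claims: the function name, the example intensities, and in particular the choice of the population median as the reference and a two-fold cutoff are assumptions made for the example.

```python
import math
from statistics import median


def skewed_cells(cells, fold=2.0):
    """Return ids of cells whose reporter-1/reporter-2 ratio differs from the
    population median ratio by at least `fold` (assumed cutoff).

    cells: dict mapping cell id -> (reporter1_signal, reporter2_signal),
           both signals strictly positive.
    """
    log_ratios = {cid: math.log2(r1 / r2) for cid, (r1, r2) in cells.items()}
    center = median(log_ratios.values())   # population reference
    limit = math.log2(fold)
    return sorted(cid for cid, lr in log_ratios.items()
                  if abs(lr - center) >= limit)


# Hypothetical fluorescence intensities for five transfected cells.
cells = {
    "c1": (100, 100),
    "c2": (110, 100),
    "c3": (95, 100),
    "c4": (400, 100),   # first reporter elevated relative to the second
    "c5": (20, 100),    # first reporter reduced relative to the second
}
print(skewed_cells(cells))  # ['c4', 'c5']
```

Working with log ratios treats a two-fold increase and a two-fold decrease symmetrically; a comparison against cells carrying a vector without the UTR sequence, as the claim also allows, would replace the median with the ratio measured for that control.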



EP 1 176 196 A1





FIG. 2







FIG. 3








European Patent Office

EUROPEAN SEARCH REPORT

Application Number: EP 00 11 5854

DOCUMENTS CONSIDERED TO BE RELEVANT

Citation of document with indication, where appropriate, of relevant passages, followed by the claims to which it is relevant:

WO 98 55562 A (SMITHKLINE BEECHAM CORP; BERGSMA DERK J (US); MOONEY JEFFREY L (US))
10 December 1998 (1998-12-10)
* page 4, line 25-29 *
* page 6, line 17-39 *
Relevant to claim: 1,4,5,8,9

SUZUKI Y ET AL: "CONSTRUCTION AND CHARACTERIZATION OF A FULL LENGTH-ENRICHED
AND A 5'-END-ENRICHED CDNA LIBRARY"
GENE: AN INTERNATIONAL JOURNAL ON GENES AND GENOMES, ELSEVIER SCIENCE PUBLISHERS,
BARKING, GB,
vol. 200, 1997, pages 149-156, XP882932531
ISSN: 0378-1119
* the whole document *
Relevant to claim: 1,4,8,9

US 6 883 727 A (TAN RUOYING ET AL)
4 July 2000 (2000-07-04)
* column 1, line 60 - column 2, line 8; claims 11-14; figure 1 *
Relevant to claim: 1,8,9

WO 94 23041 A (RIBOGENE INC)
13 October 1994 (1994-10-13)
* figure 6; examples 19-25 *
Relevant to claim: 10-15

CLASSIFICATION OF THE APPLICATION (Int.Cl.7): C12N15/10, C12N15/67

TECHNICAL FIELDS SEARCHED (Int.Cl.7): C12N

The present search report has been drawn up for all claims

Place of search: BERLIN

Date of completion of the search: 12 July 2001

Examiner: ALCONADA RODRIG..

CATEGORY OF CITED DOCUMENTS

X : particularly relevant if taken alone
Y : particularly relevant if combined with another document of the same category
A : technological background
O : non-written disclosure
P : intermediate document
T : theory or principle underlying the invention
E : earlier patent document, but published on, or after the filing date
D : document cited in the application
L : document cited for other reasons
& : member of the same patent family, corresponding document






CLAIMS INCURRING FEES

The present European patent application comprised at the time of filing more than ten claims.

□ Only part of the claims fees have been paid within the prescribed time limit. The present European search
report has been drawn up for the first ten claims and for those claims for which claims fees have
been paid, namely claim(s):

□ No claims fees have been paid within the prescribed time limit. The present European search report has
been drawn up for the first ten claims.

LACK OF UNITY OF INVENTION

The Search Division considers that the present European patent application does not comply with the
requirements of unity of invention and relates to several inventions or groups of inventions, namely:

see sheet B

□ All further search fees have been paid within the fixed time limit. The present European search report has
been drawn up for all claims.

□ As all searchable claims could be searched without effort justifying an additional fee, the Search Division
did not invite payment of any additional fee.

□ Only part of the further search fees have been paid within the fixed time limit. The present European
search report has been drawn up for those parts of the European patent application which relate to the
inventions in respect of which search fees have been paid, namely claims:

None of the further search fees have been paid within the fixed time limit. The present European search
report has been drawn up for those parts of the European patent application which relate to the invention
first mentioned in the claims, namely claims:

1,8-15 (partially) and 2,3,4,5 (complete)






LACK OF UNITY OF INVENTION 
SHEET B 



The Search Division considers that the present European patent application does not comply with the
requirements of unity of invention and relates to several inventions or groups of inventions, namely:

1. Claims: 1,8-15 (partially) and 2,3,4,5 (complete)

A cDNA library consisting essentially of mRNA 5'
untranslated regions and which is constructed by (i)
purifying poly(A)+ RNA from total RNA; performing controlled
non-random digestion of AUG sequences in the poly(A)+ RNA;
purifying said digested RNA to obtain fragments containing
the 5' end sequences and synthesizing cDNA from the purified
RNA or (ii) purifying poly(A)+ RNA from total RNA,
synthesizing nucleic acid heteroduplexes from said poly(A)+
RNA using degenerate primers, purifying said heteroduplexes,
and synthesizing cDNA from the purified heteroduplexes; said
cDNA library cloned in a vector system; said cDNA library
wherein said UTR sequences are isolated in intact form; a
bicistronic vector comprising an mRNA UTR in operative
linkage to a promoter, wherein said nucleotide sequence is
derived from the cDNA library, a first gene associated to
said UTR and a second gene unassociated to said UTR; a
method to identify a regulatory UTR sequence by using the
bicistronic vector.

2. Claims: 1,8-15 (partially) and 6 (complete)

As invention 1 but relating to a cDNA library which
essentially consists of mRNA 3' untranslated regions, said
library being obtained by purifying poly(A)+ RNA from total
RNA, performing random digestion on the poly(A)+ RNA,
purifying said digested RNA to obtain poly(A) containing
fragments and synthesizing cDNA from the purified RNA
previously obtained.

3. Claims: 1,8-15 (partially) and 7 (complete)

As invention 2 but relating to a cDNA library obtained by
purifying poly(A)+ RNA from total RNA, loading said poly(A)+
RNA on ribosomes; performing reverse transcription on said
poly(A)+ RNA using an oligo(dT) primer.